What is QSAR?


Quantitative  Structure-Activity  Relationship 


A technique used to quantify the relationship between a molecule's
structure and its biological activity


[Diagram labels: Molecular Structure; Representation; Structural Descriptors; Model Design; Properties or Activities]


There  are  guidelines/rules  to  this  approach 

1. Choose well-defined activity endpoints

2. Choose plausible molecular descriptors

3. Explore the data with statistics

4. Test hypotheses with new data (i.e., iterate)
QSAR Overview

Organophosphate Structures

Physicochemical Descriptors


Cost 

Complexity 
Computing  Power 


AMPAC  &  CODESSA 


Constitutional  Descriptors 


Reflect  molecular  composition  of  compound  without 
using  geometry  or  electronic  structure  of  molecule: 

- Number of atoms

- Absolute and relative numbers of C, H, O, S, N, F, Cl, Br, I, P atoms

- Number of bonds

- Absolute and relative numbers of single, double, triple and aromatic bonds

- Number of rings

- Number of rings divided by the number of atoms, number of benzene rings, number of benzene rings divided by the number of atoms

- Molecular and average atomic weight
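The composition-based items in this list can be computed from a molecular formula alone. The following is a minimal sketch, not the CODESSA implementation: the function name `constitutional_descriptors` and the use of paraoxon (C10H14NO6P) as an example are assumptions made for illustration, and the bond and ring counts are left out because they need the full structure rather than the formula.

```python
import re
from collections import Counter

# Rounded standard atomic weights for the elements named in the list above.
ATOMIC_WEIGHT = {
    "C": 12.011, "H": 1.008, "O": 15.999, "S": 32.06, "N": 14.007,
    "F": 18.998, "Cl": 35.45, "Br": 79.904, "I": 126.904, "P": 30.974,
}

def constitutional_descriptors(formula):
    """Composition-only descriptors from a formula string such as 'C10H14NO6P'."""
    counts = Counter()
    # Each match is an element symbol followed by an optional count.
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(digits) if digits else 1
    n_atoms = sum(counts.values())
    mol_weight = sum(ATOMIC_WEIGHT[el] * n for el, n in counts.items())
    return {
        "n_atoms": n_atoms,
        "absolute": dict(counts),
        "relative": {el: n / n_atoms for el, n in counts.items()},
        "molecular_weight": mol_weight,
        "average_atomic_weight": mol_weight / n_atoms,
    }

# Illustrative example (assumed, not taken from the slides): paraoxon.
paraoxon = constitutional_descriptors("C10H14NO6P")
```

None of these values depend on geometry or electronic structure, which is what separates constitutional descriptors from the other descriptor classes.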


Topostructural Descriptors


A  molecular  graph  is  made  up  of  Edges  and  Vertices 
Adjacency matrix (A)

Many topostructural indices can be derived from matrices A and D
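As a small illustration of deriving an index from these matrices, the sketch below builds A for a hydrogen-suppressed graph, derives a matrix D of shortest-path distances from it, and sums the upper triangle of D to get the Wiener index. It assumes that D denotes the topological distance matrix (its definition did not survive in the slide text) and that the graph is connected; the n-butane example and all function names are chosen here for illustration.

```python
from collections import deque

def adjacency_matrix(n_vertices, edges):
    """A[i][j] = 1 when vertices (atoms) i and j share an edge (bond)."""
    A = [[0] * n_vertices for _ in range(n_vertices)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    return A

def distance_matrix(A):
    """D[i][j] = number of edges on the shortest path from i to j (BFS)."""
    n = len(A)
    D = []
    for start in range(n):
        dist = [None] * n
        dist[start] = 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in range(n):
                if A[v][w] and dist[w] is None:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        D.append(dist)
    return D

def wiener_index(D):
    """Sum of distances over all unordered vertex pairs."""
    n = len(D)
    return sum(D[i][j] for i in range(n) for j in range(i + 1, n))

# Carbon skeleton of n-butane: a path of four vertices.
edges = [(0, 1), (1, 2), (2, 3)]
A = adjacency_matrix(4, edges)
D = distance_matrix(A)
W = wiener_index(D)  # 1 + 2 + 3 + 1 + 2 + 1 = 10
```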


Regression  Techniques 


Linear  Regression 
Examples 

• Heuristic

• Partial Least Squares (PLS)

• Principal Component Regression (PCR)

• Orthogonal Projection to Latent Structures (OPLS)

• Ridge Regression


Non-Linear  Regression 
Examples 

• Support Vector Machines (SVM)

• Neural Networks (NN)

• Kernel Orthogonal Projection to Latent Structures (KOPLS)

• Kernel Partial Least Squares (KPLS)


Clustering  Regression 
Examples 


• k-nearest neighbor

• Random Forest

Regression Techniques


Overview of orthogonal projection to latent structures (O-PLS)


Original data set
• Harder to interpret
• More PLS components
• Orthogonal variation in X

OPLS treated data
• Easier to interpret
• Fewer components
• More relevant

Orthogonal data set
• Evaluate orthogonal variation in principal components
• Identify source of orthogonal variation

Adapted  from  Trygg  and  Wold  2002 
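To make the split into "OPLS treated data" and an "orthogonal data set" concrete, here is a minimal NumPy sketch of removing one y-orthogonal component for a single response, following the general steps of the Trygg and Wold algorithm. It is an illustration under stated assumptions, not the software used in this work: X and y are assumed to be column-centered, only one orthogonal component is removed, and the synthetic data and the name `opls_filter_one_component` are invented for the example.

```python
import numpy as np

def opls_filter_one_component(X, y):
    """Remove one y-orthogonal component from centered X (single response y)."""
    w = X.T @ y
    w /= np.linalg.norm(w)              # predictive weight direction
    t = X @ w                           # predictive scores
    p = X.T @ t / (t @ t)               # predictive loadings
    w_ortho = p - (w @ p) * w           # part of the loading orthogonal to w
    w_ortho /= np.linalg.norm(w_ortho)
    t_ortho = X @ w_ortho               # orthogonal scores, uncorrelated with y
    p_ortho = X.T @ t_ortho / (t_ortho @ t_ortho)
    X_filtered = X - np.outer(t_ortho, p_ortho)   # "OPLS treated data"
    return X_filtered, t_ortho, p_ortho

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = X[:, 0] + 0.1 * rng.normal(size=30)
X -= X.mean(axis=0)
y -= y.mean()
X_filtered, t_ortho, p_ortho = opls_filter_one_component(X, y)
```

The outer product of `t_ortho` and `p_ortho` is the orthogonal data set that can be inspected separately, while a PLS model fitted to `X_filtered` needs fewer components.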
• Acetylcholinesterase Bimolecular Rate Constants (M⁻¹ min⁻¹)


Table  1.  Percentage  of  database  that  represents  a  particular  temperature  and 
species. 


[Figure: OP Library; axis label: OP Compounds]


Data  collected  from  69 
peer-reviewed  journal  articles. 


Species       Temperature (°C)                                              Total
                    25      37  Unknown       5      30      22      27   Percent
Human            28.46    6.93     8.15    0.00    0.00    0.00    0.19     43.72
Bovine           21.44    3.56     0.09    0.09    1.22    0.00    0.00     26.40
Unknown           7.77    0.28     1.50    0.00    0.00    0.00    0.00      9.55
Fly               2.25    0.56     4.87    0.00    1.03    0.00    0.00      8.71
Rat               0.00    1.22     0.47    0.66    0.00    0.47    0.00      2.81
Hen               0.56    0.47     0.00    0.00    0.00    0.47    0.00      1.50
Rabbit            0.47    0.00     0.00    0.00    1.03    0.00    0.00      1.50
Eel               0.75    0.75     0.00    0.00    0.00    0.00    0.00      1.50
Cricket           0.66    0.00     0.00    0.00    0.00    0.00    0.00      0.66
Guinea Pig        0.19    0.00     0.00    0.00    0.00    0.47    0.00      0.66
Pig               0.00    0.66     0.00    0.00    0.00    0.00    0.00      0.66
Mouse             0.19    0.00     0.00    0.00    0.09    0.28    0.00      0.56
NHP               0.00    0.00     0.00    0.00    0.00    0.47    0.00      0.47
Catfish           0.00    0.00     0.00    0.00    0.00    0.47    0.00      0.47
Frog              0.00    0.00     0.00    0.00    0.00    0.47    0.00      0.47
Minipig           0.00    0.37     0.00    0.00    0.00    0.00    0.00      0.37
Total Percent    62.73   14.79    15.07    0.75    3.37    3.09    0.19    100.00

1068  observations 
Acetylcholinesterase Bimolecular Rate Constants (M⁻¹ min⁻¹)


Global  Human  Orthogonal-PLS 


Monte  Carlo/Bootstrap 
Cross-Validation 
“Leave-random-number-out” 
Consensus  QSAR  Predictions 


Global  training  R2=0.91  using  74  significant 
and  uncorrelated  descriptors. 


A mean training R2 of 0.77±0.02 and an external test set Q2 of
0.64±0.10 were achieved using the significant uncorrelated
descriptors. Y-randomization Q2=-0.23±0.18.
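The validation loop behind numbers like these can be sketched as follows. This is an assumption-laden illustration rather than the procedure actually used: ordinary least squares stands in for OPLS, the data are synthetic, the held-out fraction is drawn at random each round to mimic "leave-random-number-out", and Q2 is computed against the training-set mean, which is one common convention. All function names are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept (stand-in for the OPLS model)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def q2_external(y_true, y_pred, y_train_mean):
    """1 - PRESS / SS, with SS taken about the training-set mean."""
    press = np.sum((y_true - y_pred) ** 2)
    ss = np.sum((y_true - y_train_mean) ** 2)
    return 1.0 - press / ss

def monte_carlo_cv(X, y, n_rounds=200, seed=0, shuffle_y=False):
    """Repeated random splits; shuffle_y=True gives the Y-randomization control."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(n_rounds):
        y_used = rng.permutation(y) if shuffle_y else y
        n_test = int(rng.integers(max(1, n // 10), n // 3 + 1))  # random hold-out size
        idx = rng.permutation(n)
        test, train = idx[:n_test], idx[n_test:]
        coef = fit_ols(X[train], y_used[train])
        y_pred = predict(coef, X[test])
        scores.append(q2_external(y_used[test], y_pred, y_used[train].mean()))
    return float(np.mean(scores)), float(np.std(scores))

# Synthetic descriptors and activities, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.2 * rng.normal(size=60)
q2_mean, q2_sd = monte_carlo_cv(X, y)
q2_rand_mean, q2_rand_sd = monte_carlo_cv(X, y, shuffle_y=True)
```

A Y-randomized Q2 far below the real Q2 indicates that the model is not fitting chance correlations.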
Domain of Applicability


•  A  number  of  techniques  exist  to  quantify  the  Domain  of 
Applicability. 

QSAR  model  predictions  are  only 
valid  within  the  applicability  domain. 


[Figure: plot with x-axis "Hat diagonal", 0 to 1]


If  your  test  compound  falls  within 
the  DOA  then  you  can  expect  a 
reliable  prediction. 


Compounds with high leverage
can heavily influence a model.
Predicted  responses  outside  of 
the  warning  leverage  may  not  be 
reliable. 

Possible  outliers 
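One common way to quantify the domain of applicability is through the hat diagonal (leverage) of each compound. The sketch below is an illustration only: it assumes a linear model with an intercept, uses the widely quoted warning leverage h* = 3(p + 1)/n (p descriptors, n training compounds), which may not be the cutoff used here, and runs on synthetic descriptors. The function names are hypothetical.

```python
import numpy as np

def hat_diagonal(X_train, X_query):
    """Leverage h_i = x_i (X'X)^-1 x_i' of each query row, intercept included."""
    Xt = np.column_stack([np.ones(len(X_train)), X_train])
    Xq = np.column_stack([np.ones(len(X_query)), X_query])
    xtx_inv = np.linalg.pinv(Xt.T @ Xt)
    return np.einsum("ij,jk,ik->i", Xq, xtx_inv, Xq)

def warning_leverage(n_train, n_descriptors):
    """Assumed cutoff h* = 3(p + 1)/n; other conventions exist."""
    return 3.0 * (n_descriptors + 1) / n_train

# Synthetic training descriptors, for illustration only.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 3))
# Two query compounds: one at the training centroid, one far from it.
X_query = np.vstack([X_train.mean(axis=0), [10.0, 10.0, 10.0]])
h_query = hat_diagonal(X_train, X_query)
h_star = warning_leverage(len(X_train), X_train.shape[1])
inside_domain = h_query <= h_star
```

The compound at the centroid has the minimum possible leverage (1/n), while the distant compound exceeds h* and would be flagged as outside the domain.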


AChE  Descriptor  Significance 


Descriptor Name                             Normalized P value
Avg nucleoph. react. index for a C atom                  1.000
HOMO energy                                              0.995
Min nucleoph. react. index for an O atom                 0.991
Max nucleoph. react. index for a C atom                  0.947
Max n-n repulsion for a C-H bond                         0.889
(1/6)X GAMMA polarizability (DIP)                        0.883
1X GAMMA polarizability (DIP)                            0.883
Max e-n attraction for a C-H bond                        0.831
HOMO - LUMO energy gap                                  -0.827
ESP-Max net atomic charge for a F atom                  -0.827



Regression  Techniques 


•  Heuristic  (Built  into  CODESSA  2.51) 

-  Pre-selection  of  descriptors  based  upon  a  series  of  criteria  cutoffs 

•  Variation  in  descriptors 

•  F-test 

•  R2 

•  T-value 

•  Inter-correlation 

-  F-test  measures  significance  of  the  whole  model,  t-test  reflects 
significance  of  the  parameter. 
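A rough sketch of this kind of pre-selection is shown below. It is not the CODESSA heuristic: only three of the listed criteria are covered (descriptor variation, one-parameter R2 with the activity, and inter-correlation), the cutoff values are placeholders, and the function names and toy data are invented for illustration.

```python
from statistics import mean, pvariance

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def preselect(descriptors, activity,
              min_variance=1e-6, min_r2=0.01, max_intercorr=0.8):
    """Filter descriptors with placeholder cutoffs (assumed values)."""
    # 1. Variation: drop descriptors that are (nearly) constant.
    kept = {n: col for n, col in descriptors.items()
            if pvariance(col) > min_variance}
    # 2. R2: rank by one-parameter correlation with the activity.
    r2 = {n: pearson(col, activity) ** 2 for n, col in kept.items()}
    ranked = sorted((n for n in kept if r2[n] >= min_r2),
                    key=lambda n: r2[n], reverse=True)
    # 3. Inter-correlation: skip descriptors too similar to one already chosen.
    selected = []
    for name in ranked:
        if all(abs(pearson(kept[name], kept[s])) < max_intercorr
               for s in selected):
            selected.append(name)
    return selected

# Toy data, for illustration only.
activity = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
descriptors = {
    "d1": [1, 2, 3, 4, 5, 6],
    "d2": [1, 2, 3, 4, 5, 7],        # highly inter-correlated with d1
    "constant": [4, 4, 4, 4, 4, 4],  # no variation
    "d_other": [3, 1, 2, 6, 4, 5],
}
selected = preselect(descriptors, activity)
```

Only one of the two near-duplicate descriptors survives, and the constant descriptor is removed before any correlation is computed.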


Butyrylcholinesterase 
“Serum  Cholinesterase” 


TSS  5  (410  structures) 

R2=0.8235  F=71.69  s2=0.7435  (25  descriptors) 


Bimolecular  Rate 


F=71.69,  410  compounds 
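The reported statistics can be cross-checked against each other. Assuming F is the overall regression F statistic of a least-squares model with an intercept, it follows from R2, the number of compounds and the number of descriptors; the short sketch below applies that relation to the values quoted above (the function name is hypothetical).

```python
def f_statistic(r2, n_compounds, n_descriptors):
    """Overall F = (R2 / k) / ((1 - R2) / (n - k - 1)) for k descriptors."""
    return (r2 / n_descriptors) / ((1.0 - r2) / (n_compounds - n_descriptors - 1))

# Values from the slide: R2 = 0.8235, 410 compounds, 25 descriptors.
f_bche = f_statistic(0.8235, 410, 25)
```

The result agrees with the reported F=71.69 to within the rounding of R2.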


Inhibition  of  AChE  and  BChE  in  Blood 


[Figure: Inhalation VX Minipig - 0.046 mg/m (#28); series: BChE, AChE; x-axis: Time (hr)]

Data  taken  from  literature 


Noncholinergic  Targets 


• Other serine hydrolases: various actions

• Carboxylesterases, amidases: toxicity interactions

• APH: neuropeptide metabolism

• AFMID: teratogenesis

• BChE, mAChR: cholinergic interactions

• NTE-LysoPLA: delayed neurotoxicity

• FAAH, CB1: cannabinoid interactions

Noncholinergic  Targets 

Digestive  Proteases 


Trypsin

R2=0.94, Q2=0.90, 52 structures, 10 descriptors

α-Chymotrypsin

R2=0.92, Q2=0.87, 62 structures, 10 descriptors


Ruark  et  al.  2011 
External Validation


Table 5. Trypsin results from the external validation using the ABC approach.

Training set  Compounds     R2     Q2       F     s2   Test set  Compounds  R2test  RMSEtest
A + B                35   0.95   0.88   48.46   0.10   C                17    0.85      0.46
A + C                35   0.94   0.91   77.02   0.11   B                17    0.59      0.70
B + C                34   0.91   0.84   30.89   0.15   A                18    0.82      0.56
Average           34.67   0.93   0.88   52.12   0.12   Average       17.33    0.75      0.57
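The three-fold structure of the ABC approach can be sketched as follows. How compounds were actually assigned to subsets A, B and C is not stated in the surviving text; the sketch assumes every third compound of an ordered list goes to the same subset, which happens to reproduce the subset sizes in Table 5 for 52 compounds. The compound labels and function names are placeholders.

```python
def abc_split(compounds):
    """Assign every third compound to subsets A, B and C (assumed scheme)."""
    return {"A": compounds[0::3], "B": compounds[1::3], "C": compounds[2::3]}

def abc_folds(subsets):
    """Yield (training label, training set, test label, test set) per fold."""
    for test_label in ("C", "B", "A"):
        train_labels = [k for k in ("A", "B", "C") if k != test_label]
        train = [c for k in train_labels for c in subsets[k]]
        yield " + ".join(train_labels), train, test_label, subsets[test_label]

# Placeholder identifiers standing in for the 52 trypsin compounds.
compounds = [f"cmpd_{i:02d}" for i in range(52)]
subsets = abc_split(compounds)
sizes = [(name, len(train), test, len(held_out))
         for name, train, test, held_out in abc_folds(subsets)]
```

Each fold trains on two subsets and predicts the third, so every compound is used exactly once as an external test compound.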

Table 6. α-Chymotrypsin results from the external validation using the ABC approach.

Training set  Compounds     R2     Q2       F     s2   Test set  Compounds  R2test  RMSEtest
A + B                42   0.86   0.79   26.24   0.66   C                20    0.86      0.91
A + C                41   0.90   0.63   34.21   0.49   B                21    0.81      1.13
B + C                41   0.68   0.58   14.55   1.36   A                21    0.16      1.89
Average           41.33   0.81   0.67   25.00   0.84   Average       20.67    0.61      1.31

R2 = Coefficient of determination.

Q2 = Cross-validated LOO R2.

F = Fisher F-test.

s2 = Mean squared error, $s^2 = \sum_{i=1}^{N_s} (Y_{ic} - Y_{io})(Y_{ic} - Y_{io}) / (N_s - N_d - 1)$, where Yic is the ith calculated/predicted property value, Yio is the ith observed/input property value, Ns is the number of training structures, Nd is the number of descriptors and the sum runs from 1 to Ns.

RMSE: Root mean square error.

Ruark  et  al.  2011 
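The s2 and RMSE definitions in the table notes translate directly into code. The sketch below follows those definitions; the function names and the five-point example are made up for illustration.

```python
import math

def s2(calculated, observed, n_descriptors):
    """Squared-error sum divided by (Ns - Nd - 1), as defined in the notes."""
    ns = len(observed)
    sse = sum((yc - yo) ** 2 for yc, yo in zip(calculated, observed))
    return sse / (ns - n_descriptors - 1)

def rmse(predicted, observed):
    """Root mean square error over a (test) set."""
    sse = sum((yp - yo) ** 2 for yp, yo in zip(predicted, observed))
    return math.sqrt(sse / len(observed))

# Made-up values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
calculated = [1.1, 1.9, 3.2, 3.8, 5.0]
s2_value = s2(calculated, observed, n_descriptors=1)   # 0.10 / 3
rmse_value = rmse(calculated, observed)                # sqrt(0.10 / 5)
```

Unlike RMSE, s2 penalizes the number of descriptors through its denominator, so it is the training-set statistic while RMSE is reported for the test sets.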


Trypsin  Descriptors 


Ruark  et  al.  2011 


T-test values by training set:

Code  Descriptor Name                                                 Global      AB      AC      BC
NAa   Error                                                             4.40    3.23   -6.87    0.77
D1    Number of F atoms                                                 7.82    6.63   12.65    4.92
D2    Kier shape index (order 2)                                        9.83    8.09    8.12    4.08
D3    RNCG Relative negative charge (QMNEG/QTMINUS) [Zefirov's PC]     -0.84    0.10    8.33    1.80
D4    Kier&Hall index (order 3)                                        -7.49   -5.06   -6.66   -3.80
D5    Balaban index                                                    -3.37       b   -2.65   -3.40
D6    PPSA-3 Atomic charge weighted PPSA [Zefirov's PC]                -5.81   -4.75       b   -3.40
D7    Number of O atoms                                                -5.24   -3.92       b       b
D8    Relative number of H atoms                                       -5.11   -5.17       b    1.03
D9    FPSA-1 Fractional PPSA (PPSA-1/TMSA) [Zefirov's PC]               3.66    3.10       b       b
D10   Kier shape index (order 3)                                        2.00    2.08   -1.23   -0.49

Conclusion

Conclusion 


1.  QSAR  can  be  used  to  predict  organophosphate  oxon  bimolecular 
rate  constants  for  AChE,  BChE,  trypsin  and  chymotrypsin. 

2.  Approach  can  be  applied  to  other  PBPK/PD  modeling  parameters. 

3.  QSAR  descriptors  can  provide  a  mechanistic  description  of  the 
enzymatic  reactions. 

Steric  hindrance,  connectivity,  lipophilicity,  electrophilicity, 
electrostatics,  hydrogen  bonding,  van  der  Waals 